Precision vs Confidence Tradeoffs for ℓ2-Based Frequency Estimation in Data Streams
Abstract
We consider the data stream model, where an n-dimensional vector x is updated coordinate-wise by a stream of updates. The frequency estimation problem is to process the stream in a single pass, using small memory, such that an estimate of x_i can be retrieved for any i. We present the first algorithms for ℓ2-based frequency estimation that exhibit a tradeoff between the precision (additive error) of their estimates and the confidence in those estimates, for a range of parameter values. We show that our algorithms are optimal for a range of parameters within the class of matrix algorithms, namely, those whose state corresponding to a vector x can be represented as Ax for some m × n matrix A. All known algorithms for ℓ2-based frequency estimation are matrix algorithms.
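To make the notion of a matrix algorithm concrete, the following is a minimal sketch of Count-Sketch, a well-known ℓ2-based frequency estimator. It is not the algorithm of this paper. The class name, parameter names (`rows`, `width`, `seed`), and the explicit per-coordinate hash tables are illustrative assumptions; a practical implementation would use compact hash functions rather than stored tables. The state is a linear function of x, so it can be written as Ax, and each row's estimate has additive error on the order of ‖x‖₂/√width.

```python
import random
import statistics


class CountSketch:
    """Illustrative Count-Sketch; the table is a linear map of the vector x."""

    def __init__(self, n, rows, width, seed=0):
        rng = random.Random(seed)
        self.rows = rows
        self.width = width
        # Assumption for illustration: explicit random bucket and sign tables,
        # one entry per coordinate, instead of compact hash functions.
        self.bucket = [[rng.randrange(width) for _ in range(n)] for _ in range(rows)]
        self.sign = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(rows)]
        self.table = [[0] * width for _ in range(rows)]

    def update(self, i, delta):
        """Process one stream update x_i += delta."""
        for r in range(self.rows):
            self.table[r][self.bucket[r][i]] += self.sign[r][i] * delta

    def estimate(self, i):
        """Return the median over rows of the signed counter that holds x_i."""
        return statistics.median(
            self.sign[r][i] * self.table[r][self.bucket[r][i]]
            for r in range(self.rows)
        )


# Usage: feed coordinate-wise updates, then query any coordinate.
sketch = CountSketch(n=1000, rows=5, width=64)
sketch.update(7, 3)
sketch.update(7, 2)
estimate_for_7 = sketch.estimate(7)
```

In this sketch, `width` governs precision and `rows` governs confidence (the median over more rows fails less often), which is the kind of tradeoff between the two that the abstract refers to.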